Abstract: We consider a semiparametric convolution model. We observe
random variables having a distribution given by the convolution of some
unknown density $f$ and some partially known noise density $g$. In this work,
$g$ is assumed exponentially smooth with stable law having unknown
self-similarity index $s$. In order to ensure identifiability of the model, we
restrict our attention to polynomially smooth, Sobolev-type densities $f$, with
smoothness parameter $\beta$. In this context, we first provide a consistent
estimation procedure for $s$. This estimator is then plugged into three different
procedures: estimation of the unknown density $f$, of the functional $\int f^2$ and
goodness-of-fit test of the hypothesis $H_0 : f = f_0$, where the alternative $H_1$
is expressed with respect to $L_2$-norm (i.e. has the form
$\psi_n^{-2}\|f - f_0\|_2^2 \ge C$). These procedures are adaptive with respect to
both $s$ and $\beta$ and attain the rates which are known optimal for known values
of $s$ and $\beta$. As a by-product, when the noise density is known and
exponentially smooth our testing procedure is optimal adaptive for testing
Sobolev-type densities. The estimating procedure of $s$ is illustrated on
synthetic data.
1. Introduction

Semiparametric convolution model 

Consider the semiparametric convolution model where the observed sample
$\{Y_j\}_{1 \le j \le n}$ comes from the independent sum of independent and
identically distributed (i.i.d.) random variables $X_j$ with unknown density $f$
and Fourier transform $\Phi^f$ and i.i.d. noise variables $\varepsilon_j$ with
density $g$, known only up to a parameter, and Fourier transform $\Phi^g$:

$$ Y_j = X_j + \varepsilon_j, \quad 1 \le j \le n. \tag{1} $$

The density of the observations is denoted by $p$ and its Fourier transform by
$\Phi^p$. Note that we have $p = f * g$ where $*$ denotes the convolution product
and $\Phi^p = \Phi^f \Phi^g$.

We consider noise distributions whose Fourier transform does not vanish on
$\mathbb{R}$: $\Phi^g(u) \ne 0$, $\forall u \in \mathbb{R}$. Typically,
nonparametric estimation in convolution models gives rise to the distinction of
two different behaviours for the noise distribution: polynomially or
exponentially smooth. In our setup, we focus on exponentially smooth noise where
the noise density $g$ may be known only partially. We thus assume an
exponentially smooth (or supersmooth or exponential) noise having stable
symmetric distribution with

$$ \Phi^g(u) = \exp(-|\gamma u|^s), \quad \gamma, s > 0. \tag{2} $$

The parameter $s$ is called the self-similarity index of the noise density and we
shall consider that it is unknown and belongs to a discrete grid
$S_n = \{\underline{s} = s_1 < s_2 < \dots < s_N = \bar{s}\}$, with a number $N$
of points that may grow to infinity with the number $n$ of observations (and
$0 < \underline{s} < \bar{s} \le 2$). The parameter $\gamma$ is a scale
parameter and it is supposed known in our setting. Some classical examples of
such noise densities include the Gaussian and the Cauchy distribution.
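
As an illustration of the noise model (2), the following minimal sketch draws symmetric stable variables with characteristic function $\exp(-|\gamma u|^s)$. It assumes the Chambers-Mallows-Stuck representation, which is one standard way to simulate such laws and is not necessarily the method used in the simulations of this paper; the function names `cms_transform` and `sample_stable_noise` are hypothetical.

```python
import math
import random


def cms_transform(u, w, s):
    """Map (u, w) to a symmetric stable draw with index s.

    Assumption: Chambers-Mallows-Stuck form. If u is uniform on
    (-pi/2, pi/2) and w is standard exponential, the output has
    characteristic function exp(-|t|**s) for 0 < s <= 2.
    """
    if not 0.0 < s <= 2.0:
        raise ValueError("s must lie in (0, 2]")
    ratio = math.sin(s * u) / math.cos(u) ** (1.0 / s)
    tail = (math.cos((1.0 - s) * u) / w) ** ((1.0 - s) / s)
    return ratio * tail


def sample_stable_noise(n, s, gamma=1.0, seed=0):
    """Draw n noise variables with characteristic function exp(-|gamma*t|**s)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.uniform(-math.pi / 2, math.pi / 2)
        w = rng.expovariate(1.0)
        while w == 0.0:  # guard against a zero exponential draw
            w = rng.expovariate(1.0)
        draws.append(gamma * cms_transform(u, w, s))
    return draws
```

With this representation, $s = 1$ reduces to $\tan(U)$ (the Cauchy case) and $s = 2$ reduces to $2\sin(U)\sqrt{W}$, a centred Gaussian with variance 2.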

The underlying unknown density $f$ is always supposed to belong to
$L_1 \cap L_2$. For identifiability of the model, the unknown density must be
less smooth than the noise. We shall restrict our attention to probability
density functions belonging to some Sobolev class

$$ S(\beta, L) = \left\{ f : \mathbb{R} \to \mathbb{R}_+,\ \int f = 1,\ \frac{1}{2\pi}\int |\Phi^f(u)|^2 |u|^{2\beta}\,du \le L \right\}, \tag{3} $$

for $L$ a positive constant and some unknown $\beta > 0$. We assume that the
unknown parameter $\beta$ belongs to some known interval
$[\underline{\beta}, \bar{\beta}] \subset (0, +\infty)$. We restrict this
interval to $(1/2, +\infty)$ in the case of pointwise estimation of the density
$f$. Moreover, we must assume that $f$ is not too smooth, i.e. its Fourier
transform does not decay asymptotically faster than a known polynomial of order
$\beta'$.

Assumption (A) There exists some known $A > 0$, such that
$|\Phi^f(u)| \ge A|u|^{-\beta'}$ for large enough $|u|$.
Note that when $f$ belongs to $S(\beta, L)$ and assumption (A) is fulfilled, we
necessarily have $\beta' > \beta + 1/2$. In the following, we use the notation
$q_{\beta'}(u) = A|u|^{-\beta'}$. Under assumptions (2) and (A) the model is
identifiable. Indeed, considering the Fourier transforms, we get for all real
numbers $u$

$$ \log|\Phi^p(u)| = \log|\Phi^f(u)| - |\gamma u|^s. $$

Now assume that we have equality between two Fourier transforms of the
likelihoods $\Phi^{p_1} = \Phi^{p_2}$, where
$\Phi^{p_1}(u) = \Phi^{f_1}(u) e^{-|\gamma_1 u|^{s_1}}$ and
$\Phi^{p_2}(u) = \Phi^{f_2}(u) e^{-|\gamma_2 u|^{s_2}}$.
Without loss of generality, we may assume $s_1 \le s_2$. Then we get

$$ |u|^{-s_1}\log|\Phi^{f_1}(u)| - \gamma_1^{s_1} = |u|^{-s_1}\log|\Phi^{f_2}(u)| - \gamma_2^{s_2}|u|^{s_2 - s_1}, $$

and taking the limit when $|u|$ tends to infinity implies (with assumption (A))
that $s_1 = s_2$, $\gamma_1 = \gamma_2$ and then $\Phi^{f_1} = \Phi^{f_2}$, which
proves the identifiability of the model.

In the sequel, probability and expectation with respect to the distribution
of $Y_1, \dots, Y_n$ induced by unknown density $f$ and self-similarity index $s$
will be denoted by $P_{f,s}$ and $E_{f,s}$.

Convolution models have been widely studied over the past two decades, 
mainly in a nonparametric setup where the noise density g is assumed to be 
entirely known. We will be interested here in a wider framework and will have 
to deal with the presence of a nuisance parameter $s$. We will focus on both
estimation of the unknown density $f$ and goodness-of-fit testing of the
hypothesis $H_0 : f = f_0$, with a particular interest in adaptive procedures.

Assuming the noise distribution to be entirely known is not realistic in many 
situations. Thus, dealing with the case of not entirely known noise distribution 
is a crucial issue. Some approaches [13] rely on additional direct observations 
from the noise density, which are not always available. A major problem is that 
semiparametric convolution models do not always result in identifiable models. 
However, when the noise density is exponentially smooth and the unknown den- 
sity is restricted to be less smooth than the noise, semiparametric convolution 
models are identifiable and may be considered. 

The case of a Gaussian noise density with unknown variance $\gamma$ and unknown
density $f$ without Gaussian component was first considered in [10], which
proposes an estimator of the parameter $\gamma$ that is then plugged into an
estimator of the unknown density. Note that [12] also studied a framework where
the variance of the errors is unknown. More generally, [3] consider errors with
exponentially smooth stable noise distribution, with unknown scale parameter
$\gamma$ but known self-similarity index $s$. The unknown density $f$ belongs
either to a Sobolev class, or to a class of supersmooth densities with some
parameter $r$ such that $r < s$. Minimax rates of convergence are exhibited. In
this context, the unknown parameter $\gamma$ acts as a real nuisance parameter:
the rates of convergence for estimating the unknown density are slower than in
the case of known scale, those rates being nonetheless optimal in a minimax
sense.

Another attempt to remove knowledge on the noise density appears in [11].
The author proposes a deconvolution estimator associated with a procedure for
selecting the error density between the normal supersmooth density and the
Laplace polynomially smooth density (both with fixed parameter values). Note
that our procedure is more general in that it is not restricted to two candidate
noise distributions and allows a number of unknown supersmooth distributions
that may grow to infinity with the number of observations.

Nonparametric goodness-of-fit testing has been extensively studied in the 
context of direct observations (namely a sample distributed from the density $f$
to be tested), but also for regression or in the Gaussian white noise model. We
refer to [9] for an overview on the subject. The convolution model provides an 
interesting setup where observations may come from a signal observed through 
some noise. 

Nonparametric goodness-of-fit tests in convolution models were studied in [8],
[1] and [4], only in the case of entirely known noise distribution. The approach
used in [1] is based on a minimax point of view combined with estimation of
the quadratic functional $\int f^2$. Assuming the smoothness parameter of $f$ to
be known, the authors of [8] define a version of the Bickel-Rosenblatt test
statistic and study its asymptotic distribution under the null hypothesis and
under fixed and local alternatives, while [1] provides a different
goodness-of-fit testing procedure attaining the minimax rates of testing in
various setups. The approach used in [1] is further developed in [4] to give
adaptive procedures, with respect to the smoothness parameter of $f$, in the case
of a polynomially smooth noise distribution.

In our setup, we first propose an estimator of the self-similarity index $s$,
which, plugged into kernel procedures, provides an adaptive estimator of the
unknown density $f$ with the same optimal rate of convergence as in the case
of entirely known noise density. Using the estimator of $s$, we also construct an
estimator of the quadratic functional $\int f^2$ (attaining the optimal adaptive
rate of convergence) and an $L_2$ goodness-of-fit test statistic. Note that our
procedure can only recover the index $s$ on a size-increasing but discrete grid.

Note that this work is very different from [3] as the self-similarity index $s$
plays a different role from the scale parameter $\gamma$ previously studied.
Nevertheless, we conjecture that their procedure can be extended to recover
simultaneously $s$ and $\gamma$ (when both parameters are unknown). However,
optimal rates of convergence are even slower when $\gamma$ is unknown.

Another consequence of our results is that when the noise density is known 
and exponentially smooth our testing procedure is adaptive for testing Sobolev- 
type densities, improving the previous results in [1]. 

Roadmap

In Section 2, we provide a consistent estimation procedure for the self-similarity
index. Then (Section 3) using a plug-in, we introduce a new kernel estimator
of $f$ where both the bandwidth and the kernel are data dependent. We also
introduce an estimator of the quadratic functional $\int f^2$ with sample
dependent bandwidth and kernel. We prove that these two procedures attain the
same rates of convergence as in the case of entirely known noise distribution,
and are thus asymptotically optimal in the minimax sense. We also present a
goodness-of-fit test on $f$ in this setup. We prove that the testing rate is the
same as in the case of entirely known noise distribution and thus asymptotically
optimal in the minimax sense. Section 4 illustrates our estimation procedure for
parameter $s$ on synthetic data. Proofs are postponed to Section 5.

2. Estimation of the self-similarity index s 

We first present a selection procedure $\hat{s}_n$ which asymptotically recovers
the true value of the smoothness parameter $s$ on a given discrete grid

$$ S_n = \{\underline{s} = s_1 < s_2 < \dots < s_N = \bar{s}\}, $$

where $0 < \underline{s} < \bar{s} \le 2$ and with a number $N$ of points that may
grow to infinity with $n$, under additional assumptions (see Proposition 1).

Without loss of generality, we assume that $\gamma = 1$ in the following. Indeed,
if the known $\gamma$ is not equal to 1 then we divide the observations by
$\gamma$ to get a noise with scale parameter 1. The asymptotic behavior of the
Fourier transform $\Phi^p$ of the observations is used to select the smoothness
index $s$. More precisely, we have for any large enough $|u|$

$$ A|u|^{-\beta'}\exp(-|u|^s) \le |\Phi^p(u)| \le \exp(-|u|^s). $$

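Let us now denote $\Phi^{[s]}(u) = e^{-|u|^s}$ and $I_k(u)$ the interval

$$ I_k(u) = \left[ q_{\beta'}(u)\,\Phi^{[s_k]}(u),\ \Phi^{[s_k]}(u) \right], $$

where $q_{\beta'}(u) = A|u|^{-\beta'}$ with $A$ and $\beta'$ given by Assumption
(A). Let $u_{n,k}$ for $k = 1, \dots, N$ be some well-chosen points, as described
later. Our estimation procedure uses the empirical estimator

$$ \hat{\Phi}_n^p(u) = \frac{1}{n}\sum_{j=1}^{n} \exp(iuY_j) $$

of the Fourier transform $\Phi^p$. We select all values of $k$ belonging to
$\{1, \dots, N\}$ such that $|\hat{\Phi}_n^p(u_{n,k})|$ belongs to or is closest
to the interval $I_k(u_{n,k})$. Let then $\hat{s}_n$ be the value $s_k$ for the
smallest selected $k$, respectively $s_1$ in case no $k$ was selected.
In other words, denote $\hat{S}_n \subset S_n$ the set constructed as follows,

• $s_k \in \hat{S}_n$ if $2 \le k \le N - 1$ and
  $\frac{1}{2}\{q_{\beta'}\Phi^{[s_k]} + \Phi^{[s_{k+1}]}\}(u_{n,k}) \le |\hat{\Phi}_n^p(u_{n,k})| \le \frac{1}{2}\{q_{\beta'}\Phi^{[s_{k-1}]} + \Phi^{[s_k]}\}(u_{n,k})$,

• $s_1 \in \hat{S}_n$ if $|\hat{\Phi}_n^p(u_{n,1})| \ge \frac{1}{2}\{q_{\beta'}\Phi^{[s_1]} + \Phi^{[s_2]}\}(u_{n,1})$,

• $s_N \in \hat{S}_n$ if $|\hat{\Phi}_n^p(u_{n,N})| \le \frac{1}{2}\{q_{\beta'}\Phi^{[s_{N-1}]} + \Phi^{[s_N]}\}(u_{n,N})$,

where for each index $k$, a sequence of positive real numbers
$\{u_{n,k}\}_{n \ge 0}$ has to be chosen later. If the set $\hat{S}_n$ is empty,
we add $s_1$. The estimator is

$$ \hat{s}_n = \min \hat{S}_n. \tag{4} $$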

Note that taking the smallest value such that our condition on the closest inter- 
val is satisfied ensures that, with high probability, we do not over-estimate the 
true value s. Over-estimation of s has to be avoided and in some sense, is much 
worse than under-estimation. Indeed, deconvolution with an over-estimated value 
of s could result in unbounded estimation risk. 
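
The selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the thresholds are the midpoints $\frac{1}{2}\{q_{\beta'}\Phi^{[s_k]} + \Phi^{[s_{k+1}]}\}$ and $\frac{1}{2}\{q_{\beta'}\Phi^{[s_{k-1}]} + \Phi^{[s_k]}\}$, the two-sided condition for interior indices is reconstructed from the thresholds used in the proof of Proposition 1, and the names `stable_cf`, `empirical_cf_modulus` and `select_index` are hypothetical.

```python
import cmath
import math


def stable_cf(u, s):
    """Phi^[s](u) = exp(-|u|^s), the noise characteristic function with scale 1."""
    return math.exp(-abs(u) ** s)


def empirical_cf_modulus(sample, u):
    """Modulus of the empirical characteristic function of the sample at u."""
    return abs(sum(cmath.exp(1j * u * y) for y in sample) / len(sample))


def select_index(moduli, grid, points, A, beta_prime):
    """Return the smallest grid value whose threshold conditions hold.

    moduli[k] is the empirical modulus at points[k]; grid is increasing and
    has at least two values. Falls back to grid[0] if nothing is selected.
    """
    last = len(grid) - 1

    def q(u):
        return A * abs(u) ** (-beta_prime)

    def lower(k):  # (1/2){q Phi^[s_k] + Phi^[s_{k+1}]}(u_k)
        u = points[k]
        return 0.5 * (q(u) * stable_cf(u, grid[k]) + stable_cf(u, grid[k + 1]))

    def upper(k):  # (1/2){q Phi^[s_{k-1}] + Phi^[s_k]}(u_k)
        u = points[k]
        return 0.5 * (q(u) * stable_cf(u, grid[k - 1]) + stable_cf(u, grid[k]))

    selected = []
    for k in range(len(grid)):
        above_lower = k == last or moduli[k] >= lower(k)
        below_upper = k == 0 or moduli[k] <= upper(k)
        if above_lower and below_upper:
            selected.append(grid[k])
    if not selected:
        selected.append(grid[0])
    return min(selected)
```

In practice `moduli` would be filled with `empirical_cf_modulus(sample, u)` evaluated at each point; passing the moduli explicitly keeps the decision rule separate from the data and easy to check on exact values.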

The previous procedure is proven to be consistent, with an exponential rate 
of convergence, in the following proposition. 

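Proposition 1. Under assumptions (2) and (A), consider the estimation
procedure given by (4) where

$$ u_{n,k} = \left( \frac{\log n}{2} - \frac{\delta}{s_k}\log\log n \right)^{1/s_k}, $$

where $\delta > \beta'$. The grid
$\underline{s} = s_1 < s_2 < \dots < s_N = \bar{s}$ is chosen such that

$$ |s_{k+1} - s_k| \ge d_n = \frac{c}{\log n}, \quad \text{with } c > 2\beta', \quad N - 1 \le (\bar{s} - \underline{s})/d_n. $$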

Then, for any $k \in \{1, \dots, N\}$, we have

P/,..(s„ ^ su) < exp (^-^9jP'/%lognr^'-^'y%l + o{l))^ , (5) 

where $A$ and $\beta'$ are defined in Assumption (A).
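
To make the choice of evaluation points concrete, here is a small sketch that computes a point of the form $u_{n,k} = (\log n / 2 - (\delta/s_k)\log\log n)^{1/s_k}$ and the minimal grid step $d_n = c/\log n$. Both formulas are read off the statement of Proposition 1 as reconstructed here, so they should be treated as assumptions; the function names are hypothetical.

```python
import math


def evaluation_point(n, s_k, delta):
    """Return (log(n)/2 - (delta/s_k) * log(log(n))) ** (1/s_k).

    Raises ValueError when the inner quantity is not positive, which can
    happen for moderate n and small s_k.
    """
    inner = math.log(n) / 2.0 - (delta / s_k) * math.log(math.log(n))
    if inner <= 0.0:
        raise ValueError("n is too small for this (s_k, delta): inner term <= 0")
    return inner ** (1.0 / s_k)


def minimal_grid_step(n, c):
    """Return d_n = c / log(n), the minimal spacing between grid values."""
    return c / math.log(n)
```

The guard is not cosmetic: for `n = 1000`, `s_k = 0.5` and `delta = 1` the inner term is negative, so the asymptotic formula gives no usable point at that sample size. This is consistent with the simulation section fixing the points by hand.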

3. Adaptive estimation and tests 

We now plug the preliminary estimator of $s$ into the usual estimation and testing
procedures for $f$.

3.1. Density estimation 

Let us introduce the deconvolution kernel $\hat{K}_n$ (see [5] for a recent
survey) built with a preliminary estimation of $s$ plugged into the usual
expression. It is defined by its Fourier transform $\Phi^{\hat{K}_n}$,

$$ \Phi^{\hat{K}_n}(u) = \exp\left\{ \left( \frac{|u|}{\hat{h}_n} \right)^{\hat{s}_n} \right\} 1_{|u| \le 1}, \tag{6} $$

$$ \text{where } \hat{h}_n = \left( \frac{\log n}{2} - \frac{\bar{\beta} - \hat{s}_n + 1/2}{\hat{s}_n}\log\log n \right)^{-1/\hat{s}_n}. \tag{7} $$

Note that both the bandwidth sequence $\hat{h}_n$ and the kernel $\hat{K}_n$ are
random and depend on observations $Y_1, \dots, Y_n$. Now, the estimator of $f$ is
given by

$$ \hat{f}_n(x) = \frac{1}{n\hat{h}_n}\sum_{i=1}^{n} \hat{K}_n\left( \frac{x - Y_i}{\hat{h}_n} \right). \tag{8} $$
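
A plug-in estimator of this kind can be evaluated directly in the Fourier domain. The sketch below assumes scale $\gamma = 1$ and a kernel whose Fourier transform is $\exp\{(|u|/h)^s\}1_{|u| \le 1}$, in which case the estimate at $x$ equals $\frac{1}{2\pi}\int_{|u| \le 1/h} e^{-iux}\,\hat{\Phi}_n^p(u)\,e^{|u|^s}\,du$. The integral is approximated by a plain trapezoid rule; the function names and the `num_steps` parameter are hypothetical, and `s` and `h` stand for the plugged-in index and bandwidth.

```python
import cmath
import math


def empirical_cf(sample, u):
    """Empirical characteristic function (1/n) * sum_j exp(i*u*Y_j)."""
    return sum(cmath.exp(1j * u * y) for y in sample) / len(sample)


def deconvolution_density(x, sample, s, h, num_steps=2000):
    """Approximate (1/2pi) * int_{|u|<=1/h} e^{-iux} ecf(u) e^{|u|^s} du."""
    a = 1.0 / h
    step = 2.0 * a / num_steps
    total = 0.0
    for i in range(num_steps + 1):
        u = -a + i * step
        weight = 0.5 if i in (0, num_steps) else 1.0  # trapezoid end weights
        # The imaginary parts cancel by symmetry, so only the real part is kept.
        value = (cmath.exp(-1j * u * x) * empirical_cf(sample, u)).real
        total += weight * value * math.exp(abs(u) ** s)
    return total * step / (2.0 * math.pi)
```

The cost is of order `num_steps * len(sample)` per evaluation point, which is adequate for illustration. Note how the factor `exp(|u|**s)` grows with `s`: this is the mechanism by which an over-estimated index inflates the estimate.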



This estimation procedure is consistent and adaptively achieves the minimax
rate of convergence when considering unknown densities $f$ in the union of
Sobolev balls $S(\beta, L)$ with
$\beta \in [\underline{\beta}, \bar{\beta}] \subset (1/2; +\infty)$ and unknown
smoothness parameter $s$ for the noise density in a discrete grid $S_n$.

Corollary 1. Under assumptions (2) and (A), for any
$\bar{\beta} > \underline{\beta} > 1/2$, the estimation procedure given by (8)
which uses estimator $\hat{s}_n$ defined by (4) with parameter values:
$\{u_{n,k}\}$ given by Proposition 1,
$\delta > \beta' + \bar{s}^2/(2\underline{s})$, $d_n \ge c/\log n$
and $c > 2\beta'$, satisfies, for any real number $x$,

$$ \limsup_{n \to \infty}\ \sup_{s \in S_n}\ \sup_{\beta \in [\underline{\beta}, \bar{\beta}]}\ \sup_{f \in S(\beta, L)} (\log n)^{(2\beta - 1)/s}\, E_{f,s}|\hat{f}_n(x) - f(x)|^2 < \infty. $$

Moreover, this rate of convergence is asymptotically optimal adaptive. 

Remark 1. This result is obtained by using that, with high probability, the
estimator $\hat{s}_n$ is equal to the true value $s$ on the grid (see
Proposition 1).

Note that the optimality of this procedure is a direct consequence of a result
by [ti], which considers the convolution model for circular data with $\beta$
and $s$ fixed and known. This result confirms the results of [2] for adaptive
estimation of linear functionals in the convolution model with known parameter
$s$. Therefore we may say that there is no loss due to adaptation, neither with
respect to $\beta$ nor to $s$.

Note also that by similar calculations we get that the adaptive estimator
$\hat{f}_n$ attains the rate $(\log n)^{-2\beta/s}$ over Hölder classes of
probability density functions of smoothness $\beta$, for the mean squared error
(pointwise risk).

Moreover, it can be shown that the mean integrated squared error of the
adaptive estimator $\hat{f}_n$ converges at the rate $(\log n)^{-2\beta/s}$ over
either Sobolev or Hölder classes of functions. In [7], lower bounds of the same
order were proven over Hölder classes of density functions $f$.

3.2. Goodness-of-fit test 

In the sequel, $\|\cdot\|_2$ denotes the $L_2$-norm, $\bar{M}$ is the complex
conjugate of $M$ and $\langle M, N \rangle = \int M(x)\bar{N}(x)\,dx$ is the
scalar product of complex-valued functions in $L_2(\mathbb{R})$. From now on, we
consider again that $[\underline{\beta}, \bar{\beta}] \subset (0, +\infty)$.

For a given density $f_0$ in the class $S(\beta_0, L_0)$, we want to test the
hypothesis

$$ H_0 : f = f_0 $$

from observations $Y_1, \dots, Y_n$ given by (1). We extend the results of [1] by
giving the family of sequences
$\Psi_n = \{\psi_{n,\beta}\}_{\beta \in [\underline{\beta}, \bar{\beta}]}$ which
separates (with respect to $L_2$-norm) the null hypothesis from a larger
alternative

$$ H_1(C, \Psi_n) : f \in \bigcup_{\beta \in [\underline{\beta}, \bar{\beta}]} \{ f \in S(\beta, L) \text{ and } \psi_{n,\beta}^{-2}\|f - f_0\|_2^2 \ge C \}. $$

Let us first remark that as we use noisy observations (and unlike what happens
with direct observations), this test cannot be reduced to testing uniformity of
the distribution density of the observed sample (i.e. $f_0 = 1$ with support on
the finite interval $[0; 1]$).

We recall that the usual procedure is to construct, for any $0 < \epsilon < 1$, a
test statistic $\Delta_n^*$ (an arbitrary function, with values in $\{0, 1\}$,
which is measurable with respect to $Y_1, \dots, Y_n$ and such that we accept
$H_0$ if $\Delta_n^* = 0$ and reject it otherwise) for which there exists some
$C^0 > 0$ such that

$$ \limsup_{n \to \infty}\ \sup_{s \in S_n} \left\{ P_{f_0,s}[\Delta_n^* = 1] + \sup_{f \in H_1(C, \Psi_n)} P_{f,s}[\Delta_n^* = 0] \right\} \le \epsilon, \tag{9} $$

holds for all $C > C^0$. This part is called the upper bound of the testing rate.
The second step is to prove the minimax optimality of this procedure, i.e. the
lower bound

$$ \liminf_{n \to \infty}\ \inf_{\Delta_n}\ \sup_{s \in S_n} \left\{ P_{f_0,s}[\Delta_n = 1] + \sup_{f \in H_1(C, \Psi_n)} P_{f,s}[\Delta_n = 0] \right\} \ge \epsilon, \tag{10} $$

for some $C_0 > 0$ and for all $0 < C < C_0$, where the infimum is taken over all
test statistics $\Delta_n$.

An additional assumption (T) used in [1] on the tail behaviour of $f_0$ (ensuring
it does not vanish arbitrarily fast) is needed to obtain the optimality result,
which is in fact a consequence of [1]. We recall this assumption here for the
reader's convenience.

Assumption (T)

$$ \exists c_0 > 0,\ \forall x \in \mathbb{R},\quad f_0(x) \ge \frac{c_0}{1 + |x|^2}. $$

Remark 2. Similar results may be obtained under the more general assumption:
there exists some $p \ge 1$ such that $f_0(x)$ is bounded from below by
$c_0(1 + |x|^p)^{-1}$ for large enough $x$.

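Now, the first step is to construct an estimator of $\int f^2$. Using the same
kernel (6) and the same random bandwidth (7), we define

$$ \hat{T}_n = \frac{2}{n(n-1)} \sum\sum_{1 \le k < j \le n} \left\langle \frac{1}{\hat{h}_n}\hat{K}_n\left( \frac{\cdot - Y_k}{\hat{h}_n} \right),\ \frac{1}{\hat{h}_n}\hat{K}_n\left( \frac{\cdot - Y_j}{\hat{h}_n} \right) \right\rangle. \tag{11} $$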
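
A U-statistic of this type can be computed without the double sum. Assuming scale $\gamma = 1$ and a kernel with Fourier transform $\exp\{(|u|/h)^s\}1_{|u| \le 1}$, the Plancherel formula turns each scalar product into $\frac{1}{2\pi}\int_{|u| \le 1/h} e^{2|u|^s} e^{iu(Y_k - Y_j)}\,du$, and summing over $k \ne j$ gives $|\sum_j e^{iuY_j}|^2 - n$ under the integral. The sketch below uses this identity with a trapezoid rule; the function name and `num_steps` are hypothetical, and `s` and `h` stand for the plugged-in index and bandwidth.

```python
import cmath
import math


def quadratic_functional(sample, s, h, num_steps=2000):
    """Approximate the U-statistic estimate of the integral of f^2.

    Uses sum_{k != j} exp(i*u*(Y_k - Y_j)) = |sum_j exp(i*u*Y_j)|^2 - n,
    so the cost is num_steps * n rather than num_steps * n^2.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    a = 1.0 / h
    step = 2.0 * a / num_steps
    total = 0.0
    for i in range(num_steps + 1):
        u = -a + i * step
        weight = 0.5 if i in (0, num_steps) else 1.0  # trapezoid end weights
        char_sum = sum(cmath.exp(1j * u * y) for y in sample)
        cross_terms = abs(char_sum) ** 2 - n  # off-diagonal pairs only
        total += weight * cross_terms * math.exp(2.0 * abs(u) ** s)
    return total * step / (2.0 * math.pi * n * (n - 1))
```

Removing the diagonal terms is what distinguishes this estimate from the squared $L_2$-norm of the density estimate.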



Corollary 2. Under assumptions (2) and (A), for any
$\bar{\beta} > \underline{\beta} > 0$, the estimation procedure given by (11)
which uses estimator $\hat{s}_n$ defined by (4) with parameter values:
$\{u_{n,k}\}$ given by Proposition 1,
$\delta > \beta' + \bar{s}^2/(2\underline{s})$, $d_n \ge c/\log n$
and $c > 2\beta'$, satisfies,

$$ \limsup_{n \to \infty}\ \sup_{s \in S_n}\ \sup_{\beta \in [\underline{\beta}, \bar{\beta}]}\ \sup_{f \in S(\beta, L)} (\log n)^{2\beta/s} \left\{ E_{f,s}\left| \hat{T}_n - \int f^2 \right|^2 \right\}^{1/2} < \infty. $$

Moreover, this rate of convergence is asymptotically adaptive optimal.

The rate of convergence of this procedure is the same as in the case of known
self-similarity index $s$ and known smoothness parameter $\beta$. It is thus
asymptotically optimal adaptive according to results obtained by [1].

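Let us now define, for any $f_0 \in S(\bar{\beta}, L)$,

$$ \hat{T}_n^0 = \frac{2}{n(n-1)} \sum\sum_{1 \le k < j \le n} \left\langle \frac{1}{\hat{h}_n}\hat{K}_n\left( \frac{\cdot - Y_k}{\hat{h}_n} \right) - f_0,\ \frac{1}{\hat{h}_n}\hat{K}_n\left( \frac{\cdot - Y_j}{\hat{h}_n} \right) - f_0 \right\rangle. \tag{12} $$

This statistic is used for goodness-of-fit testing of the hypothesis $H_0$ versus
$H_1$. The test is constructed as usual

$$ \Delta_n^* = \begin{cases} 1 & \text{if } |\hat{T}_n^0|\,\hat{t}_n^{-2} > C^*, \\ 0 & \text{otherwise,} \end{cases} \tag{13} $$

for some constant $C^* > 0$ and a random threshold $\hat{t}_n^2$ to be specified.
To ease computation, we may write, using the Plancherel formula,

$$ \hat{T}_n^0 = \frac{2}{n(n-1)} \sum\sum_{1 \le k < j \le n} \frac{1}{2\pi} \left\langle \Phi^{\hat{K}_n}(\hat{h}_n \cdot)e^{i \cdot Y_k} - \Phi^{f_0},\ \Phi^{\hat{K}_n}(\hat{h}_n \cdot)e^{i \cdot Y_j} - \Phi^{f_0} \right\rangle $$

$$ = \frac{1}{\pi n(n-1)} \sum\sum_{1 \le k < j \le n} \int \left( \Phi^{\hat{K}_n}(\hat{h}_n u)e^{iuY_k} - \Phi^{f_0}(u) \right) \overline{\left( \Phi^{\hat{K}_n}(\hat{h}_n u)e^{iuY_j} - \Phi^{f_0}(u) \right)}\,du. $$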

Corollary 3. Under assumptions (2) and (A), for any
$0 < \underline{\beta} < \bar{\beta}$, any $L > 0$ and for any
$f_0 \in S(\bar{\beta}, L)$, consider the testing procedure given by (13) which
uses the test statistic (12) with estimator $\hat{s}_n$ defined by (4) with
parameter values: $\{u_{n,k}\}$ given by Proposition 1,
$\delta > \beta' + \bar{s}^2/(2\underline{s})$, $d_n \ge c/\log n$ and
$c > 2\beta'$, with random threshold and (slightly modified) random bandwidth

$$ \hat{t}_n^2 = \left( \frac{\log n}{2} \right)^{-2\bar{\beta}/\hat{s}_n}; \qquad \hat{h}_n = \left( \frac{\log n}{2} - \frac{2\bar{\beta}}{\hat{s}_n}\log\log n \right)^{-1/\hat{s}_n}, $$

and any large enough positive constant $C^*$. This testing procedure satisfies (9)
for any $\epsilon \in (0, 1)$ with testing rate

$$ \Psi_n = \{\psi_{n,\beta}\}_{\beta \in [\underline{\beta}, \bar{\beta}]} \quad \text{given by} \quad \psi_{n,\beta} = \left( \frac{\log n}{2} \right)^{-\beta/s}. $$

Moreover, if $f_0 \in S(\bar{\beta}, cL)$ for some $0 < c < 1$ and if Assumption
(T) holds, then this testing rate is asymptotically adaptive optimal over the
family of classes $\{S(\beta, L), \beta \in [\underline{\beta}; \bar{\beta}]\}$
and for any $s \in S_n$ (i.e. (10) holds).

Adaptive optimality (namely (10)) of this testing procedure directly follows
from [1] as there is no loss due to adaptation to $\beta$ nor to $s$. Note also
that the case of known $s$ and adaptation only with respect to $\beta$ is
included in our results and is entirely new.

4. Simulations 

In this section, we illustrate some of our results on synthetic data. We consider
two different signal densities: the density of the sum of 5 independent Laplace
random variables, Laplace(5) (having standard deviation $\sqrt{10}$) and a Gamma
distribution with parameters (3/2, 1/2), or $\chi_3^2$ (with standard deviation
$\sqrt{6}$), as described in Table 1.

The noise densities were selected among 4 different exponentially smooth
distributions as described in Table 2.

The simulation of random variables having Fourier transform $\Phi^{[0.5]}(u)$ and
$\Phi^{[1.5]}(u)$ is based on [14]. We thus simulated 8 different samples, each
one containing $n$ observations, where $n$ ranges over {500; 1000; 2000; 5000}.
We used a scale 0.1 on the signal density in order to have a small
signal-to-noise ratio (defined as the ratio of the standard deviation of the
signal to that of the noise). Note that the noise has finite standard deviation
only for $s = 2$ and it equals $\sqrt{2}$. In this case, the signal-to-noise
ratio equals 0.22 when the signal has Laplace density and 0.17 for the Gamma
distribution.
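
The two signal-to-noise ratios can be checked by direct arithmetic. The short sketch below assumes unit-scale Laplace summands (variance 2 each) and a shape/rate parametrisation of the Gamma distribution, both of which agree with the standard deviations $\sqrt{10}$ and $\sqrt{6}$ quoted above; the helper name `signal_to_noise` is hypothetical.

```python
import math


def signal_to_noise(scale, sd_signal, sd_noise):
    """Ratio of the standard deviation of the scaled signal to that of the noise."""
    return scale * sd_signal / sd_noise


# Sum of five independent unit-scale Laplace variables: variance 5 * 2 = 10.
sd_laplace5 = math.sqrt(5 * 2.0)
# Gamma with shape 3/2 and rate 1/2: variance shape / rate^2 = 6.
sd_gamma = math.sqrt(1.5 / 0.5 ** 2)
# Stable noise with s = 2 and characteristic function exp(-u^2): variance 2.
sd_noise = math.sqrt(2.0)

snr_laplace = signal_to_noise(0.1, sd_laplace5, sd_noise)
snr_gamma = signal_to_noise(0.1, sd_gamma, sd_noise)
```

Rounded to two decimals, `snr_laplace` and `snr_gamma` give the values 0.22 and 0.17 reported in the text.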

We then performed selection of $s$ on the finite grid $S_n = \{0.5, 1, 1.5, 2\}$.
The points $u_{n,k}$ were chosen independently of the size $n$ of the sample. The
choice is based both on theoretical grounds and on a previous simulation study.
We fixed the following values $u_{n,1} = 2.5$; $u_{n,2} = 1.7$; $u_{n,3} = 1.5$;
$u_{n,4} = 1.45$. For each sample and each sample size, we performed $m = 100$
iterations of the procedure


Table 1
Signal densities

Signal density      Fourier transform
Laplace(5)          $\Phi_1(u) = (1 + u^2)^{-5}$
Gamma(3/2, 1/2)     $\Phi_2(u) = (1 - 2iu)^{-3/2}$



Table 2
Noise densities

Noise stable density    Fourier transform
s = 0.5                 $\Phi^{[0.5]}(u) = \exp(-|u|^{1/2})$
s = 1                   $\Phi^{[1]}(u) = \exp(-|u|)$
s = 1.5                 $\Phi^{[1.5]}(u) = \exp(-|u|^{1.5})$
s = 2                   $\Phi^{[2]}(u) = \exp(-|u|^2)$
Table 3
Number of successes ($\hat{s}_n = s$) for 100 iterations of the procedure, when
the signal density is Laplace

           n = 500    n = 1000    n = 2000    n = 5000
s = 0.5       85          93          98         100
s = 1         66          87          95         100
s = 1.5       65          82          93         100
s = 2         73          90          93          99



Table 4
Number of successes ($\hat{s}_n = s$) for 100 iterations of the procedure, when
the signal density is Gamma

           n = 500    n = 1000    n = 2000    n = 5000
s = 0.5       94          99         100         100
s = 1         71          88          98         100
s = 1.5       91          98         100         100
s = 2         69          79          84          98



and the results are presented in Table 3 for the Laplace signal density and in
Table 4 for the Gamma signal density.

We naturally observe that increasing the number of observations improves
the performance of the procedure, with almost perfect results when $n = 5000$.
However, the results obtained with small sample sizes ($n = 500$) are already
encouraging (at least 65% of success).

In the case where the true parameter s does not belong to the grid, we 
observed that the procedure recovers the value of the grid which is closest to s. 

5. Proofs 

We use C to denote an absolute constant whose values may change along the 
lines. 

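Proof of Proposition 1. We fix the true value $s = s_k$ in the grid. Recall that
the size of the grid is at most given by the step $d_n = c(\log n)^{-1}$. We want
to control

$$ P_{f,s_k}(\hat{s}_n \ne s_k) = P_{f,s_k}(\hat{s}_n > s_k) + P_{f,s_k}(\hat{s}_n < s_k). $$

The overestimation case, namely $\hat{s}_n > s_k$, is the simplest to deal with.
By definition of $\hat{s}_n$, we have

$$ P_{f,s_k}(\hat{s}_n > s_k) \le P_{f,s_k}\left( |\hat{\Phi}_n^p(u_{n,k})| < \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_k]} + \Phi^{[s_{k+1}]}\}(u_{n,k}) \right) $$
$$ \qquad + P_{f,s_k}\left( |\hat{\Phi}_n^p(u_{n,k})| > \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_{k-1}]} + \Phi^{[s_k]}\}(u_{n,k}) \right). \tag{14} $$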

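Considering the first term in the right hand side of the previous inequality, and
using that $|\Phi^p(u_{n,k})| \ge q_{\beta'}(u_{n,k})\Phi^{[s_k]}(u_{n,k})$, we
can write

$$ P_{f,s_k}\left( |\hat{\Phi}_n^p(u_{n,k})| < \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_k]} + \Phi^{[s_{k+1}]}\}(u_{n,k}) \right) \le P_{f,s_k}\left( |\hat{\Phi}_n^p(u_{n,k}) - \Phi^p(u_{n,k})| > \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_k]} - \Phi^{[s_{k+1}]}\}(u_{n,k}) \right). $$

We will often use the following lemma.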

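Lemma 1. For any $j \in \{1, \dots, N - 1\}$ and $k \in \{2, \dots, N\}$, we have

$$ \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_j]} - \Phi^{[s_{j+1}]}\}(u_{n,j}) \ge \tfrac{1}{2}q_{\beta'}(u_{n,j})\Phi^{[s_j]}(u_{n,j})(1 + o(1)), $$
$$ \text{and} \quad \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_{k-1}]} - \Phi^{[s_k]}\}(u_{n,k}) \ge \tfrac{1}{2}\Phi^{[s_k]}(u_{n,k})(1 + o(1)). $$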

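Proof of Lemma 1. By using both that $s_{j+1} - s_j \ge d_n$ and
$d_n \log(u_{n,j}) \to 0$, we have

$$ \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_j]} - \Phi^{[s_{j+1}]}\}(u_{n,j}) $$
$$ \ge \tfrac{1}{2}\Phi^{[s_j]}(u_{n,j})q_{\beta'}(u_{n,j})\left\{ 1 - A^{-1}u_{n,j}^{\beta'}\exp\left[ u_{n,j}^{s_j}(1 - \exp(d_n \log u_{n,j})) \right] \right\} $$
$$ = \tfrac{1}{2}\Phi^{[s_j]}(u_{n,j})q_{\beta'}(u_{n,j})\left\{ 1 - A^{-1}u_{n,j}^{\beta'}\exp\left[ -u_{n,j}^{s_j}d_n \log u_{n,j}(1 + o(1)) \right] \right\} $$
$$ = \tfrac{1}{2}\Phi^{[s_j]}(u_{n,j})q_{\beta'}(u_{n,j})\left\{ 1 - A^{-1}\exp\left[ (-c/2 + \beta')\log(u_{n,j})(1 + o(1)) \right] \right\} $$
$$ = \tfrac{1}{2}\Phi^{[s_j]}(u_{n,j})q_{\beta'}(u_{n,j})(1 + o(1)), $$

as soon as $-c/2 + \beta' < 0$, i.e. $c > 2\beta'$. Similarly, we have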

i{g,3'<i>[^-^l-$[^l}K.,) 



= -$'^'(Wnj) {<7;3'("nj)exp«^j - M,?'/) - l} 



> -$'^'(w«j) {Aexp(w^^'^d„ logw„j - P' log(u„j)) - 1} 
= ^<I'[^iK,,){Acxp((c/2-/3')logK,,)(l + o(l))) -1} 

> i$[^lK,,)(l + o(l)), 

as soon as $c > 2\beta'$. This ends the proof of the lemma. □
Using the first result of this lemma, combined with Hoeffding inequality, we
obtain



(71 
- ^g^'K.fc) exp(-2<';J(l + o(l)) 

< expf-— 22^/^^(logn)^^(l + o(l))) 



Similarly, by using the bound $|\Phi^p(u_{n,k})| \le \Phi^{[s_k]}(u_{n,k})$, and
the second result of Lemma 1, the second term in the right hand side of (14)
satisfies



< P/,,, {mun,k) - $''K,fc)l > \ {<Z/.'$''-'l - $['!} K,fc)) 

<P/,,,(|4^(u„,,)-<i>PK.fc)l > i$WK.fc)(i + o(i)) 

Hoeffding inequality leads to 

<exp('-i(logn)2*/-^''(l + o(l)) 
Finally, the overestimation probability (14) is bounded by 

,(s„>Sfc)<expf-^22^'/^.(logn)^^(l + o(l)) 



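Let us now consider the probability of underestimation. The case
$\hat{s}_n = s_1$ has to be dealt with separately as it may result from
emptiness of the set $\hat{S}_n$. By using the definition of $\hat{s}_n$, we have

$$ P_{f,s_k}(s_1 < \hat{s}_n < s_k) \le \sum_{j=2}^{k-1} P_{f,s_k}\left( |\hat{\Phi}_n^p(u_{n,j})| \ge \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_j]} + \Phi^{[s_{j+1}]}\}(u_{n,j}) \right). $$

As $|\Phi^p(u_{n,j})| \le \Phi^{[s]}(u_{n,j}) = \Phi^{[s_k]}(u_{n,j}) \le \Phi^{[s_{j+1}]}(u_{n,j})$, we get

$$ P_{f,s_k}(s_1 < \hat{s}_n < s_k) \le \sum_{j=2}^{k-1} P_{f,s_k}\left( |\hat{\Phi}_n^p(u_{n,j}) - \Phi^p(u_{n,j})| \ge \tfrac{1}{2}\{q_{\beta'}\Phi^{[s_j]} - \Phi^{[s_{j+1}]}\}(u_{n,j}) \right). $$

Now, using Lemma 1 again and the Hoeffding inequality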



fe-1 
< 

J=2 
fc-1 
< 

i=2 



^P/,., (|I'^K.,)-*^K,.)I > ^$[^'iK,.)90'K,.)(i + o(i)) 

J=2 ^ 

fc-1 

i=2 

< 7Vexp(-^22'''/'*.(iogn)^^^(l + o(l)) 



< expf- — 22^/^(logn)^T-^(l + o(l)) 

as $s_j \le \bar{s}$ and $N = O(\log n)$.

The case $\hat{s}_n = s_1$ can now be easily handled. Indeed, let us denote by
$E_j$ the event

$$ E_j = \left\{ |\hat{\Phi}_n^p(u_{n,j})| \ge \tfrac{1}{2}\left( q_{\beta'}\Phi^{[s_j]} + \Phi^{[s_{j+1}]} \right)(u_{n,j}) \right\}. $$

Now, if $\hat{s}_n = s_1$, then either the event $E_1$ happens, or none of the
$E_j$ do and thus in particular, $E_k$ is not satisfied. Thus,

$$ P_{f,s_k}(\hat{s}_n = s_1) \le P_{f,s_k}(E_1) + P_{f,s_k}(E_k^c). $$

The probability of $E_k^c$ has already been controlled (overestimation
probability). Let us consider the probability of the first event. As previously
seen, using Lemma 1 and Hoeffding inequality,

< Ff^s, {mun,l) ~ *P(U„.1)| > \ {<Z/.'$[^1 + $[21} K,i) - |$PKa)| 

< P/,., (^|$j;Ka)-$^(u„4)l > i<i>WK,iVKa)(l + o(l)) 

- '^''P (-59?i'("«.i) exp(-20) 

< expf-— 22^/^i(logn)^^(l + o(I))j. 
Thus, the probability of underestimation is bounded by 

P/,.,(5„ < Sk) < exp ('~^22^'/^-(iog„)2(*--/3')A-(i + o(i)) 
and gathering the results concerning overestimation and underestimation, we
get

P/..(^„ ^ s,) < exp (-^2'^'/^\ognr('-^'y^il + o(l))) 

D 

Proof of Corollary 1. Let the true value of the parameter be some fixed point
$s_k$ on the grid. We introduce respectively $h_n$, the non-random version of the
bandwidth $\hat{h}_n$, and $K_n$, the non-random version of the kernel
$\hat{K}_n$, both constructed with true self-similarity index $s_k$. The Fourier
transform $\Phi^{K_n}$ of $K_n$ thus satisfies

$$ \Phi^{K_n}(u) = \exp\left\{ \left( \frac{|u|}{h_n} \right)^{s_k} \right\} 1_{|u| \le 1}, $$

$$ \text{where } h_n = \left( \frac{\log n}{2} - \frac{\bar{\beta} - s_k + 1/2}{s_k}\log\log n \right)^{-1/s_k}. $$

We also introduce the corresponding (classical) estimator

$$ f_n(x) = \frac{1}{nh_n}\sum_{i=1}^{n} K_n\left( \frac{x - Y_i}{h_n} \right), $$

which corresponds to the case of entirely known noise distribution. Note that
obviously, $s_k$, $K_n$ and $h_n$ are unknown to the statistician. These objects
are used only as tools to assess the convergence of the procedure. Now, remark
that we have
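$$ E_{f,s_k}[|\hat{f}_n(x) - f(x)|^2] = E_{f,s_k}[|\hat{f}_n(x) - f(x)|^2 1_{\hat{s}_n = s_k}] + E_{f,s_k}[|\hat{f}_n(x) - f(x)|^2 1_{\hat{s}_n \ne s_k}] = T_1 + T_2, $$

say. Let us focus on the first term

$$ T_1 \le E_{f,s_k}[|f_n(x) - f(x)|^2] = \{E_{f,s_k}[f_n(x)] - f(x)\}^2 + \mathrm{Var}_{f,s_k}\{f_n(x)\}, $$

introducing the bias and the variance of the estimator $f_n(x)$. By using
classical results on this estimator, we have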
results on this estimator, we have 

2(sfc-l) 



T,<o(e~^)+or" "-p(^/^"^) 



Now, we prove that the second term $T_2$ is negligible compared with the main term
$T_1$, by using Proposition 1 and uniform bounds on $|\hat{f}_n(x)|$ and
$|f(x)|$. First,



|/„(x)| < J eJ*\'i^,^^,^f^jt^oCK-'cMyK}) 

< 0(l)(log?7yi-"")/^exp{(logn)""/^} 
and also

\f{x)\ < J i^^mdt = 0(1(1 + \tn-'dt) ^ 0(1), 

leading to 

T2 = 0((logn)2(i-^)/^exp{2(logn)^"/^})P/,,,(s„ ^ Sk) 
= O ('(logn)2(i-^")/^cxp f2{logny/^ - ^2^f^'^'{lognf^^~P">/%l + o(l)) 

As soon as we choose $2(\delta - \beta')/\bar{s} > \bar{s}/\underline{s}$, this
second term $T_2$ will be negligible compared with $T_1$. In conclusion,

E,,ji/„(x)-/(x)p] = o(fer^)+ofa---^) ""p^'/^"^ 

\ n 

= 0((logn)-(2/3-i)/-.). 

D 

Proof of Corollary 2. We keep the same notation as in the proof of
Corollary 1 and denote by $I$ the functional $\int f^2$ and by $T_n$ the
estimator using deterministic parameters $s_k$, $K_n$ and $h_n$. In the same way
as in the proof of Corollary 1, we write

$$ E_{f,s_k}[|\hat{T}_n - I|^2] \le E_{f,s_k}[|T_n - I|^2] + E_{f,s_k}[|\hat{T}_n - I|^2 1_{\{\hat{s}_n \ne s_k\}}]. \tag{15} $$

Let us first focus on the first term appearing in the right hand side of (15). We
split it into the square of a bias term plus a variance term. The bias is bounded
by

$$ |E_{f,s_k}T_n - I| \le O((\log n)^{-2\beta/s_k}). $$

Concerning the variance term, we easily get

$$ \mathrm{Var}_{f,s_k}(T_n) \le \frac{C_1}{n^2}h_n^{s_k - 1}\exp(4/h_n^{s_k}) + \frac{C_2}{n}h_n^{2\beta + s_k - 1}\exp(2/h_n^{s_k}), $$

where $C_1$ and $C_2$ are positive constants (we refer to [1], Theorem 4 for more
details). Using the form of the bandwidth $h_n$, we have



i2 ^ / log'^ 



5/,,jr„ - /|^ = o 

Let us now focus on the second term appearing in the right hand side of (15). 
Denoting by ft-o = (logn/2)~^/-, we have 

|f„| < -^ / eMm')du - 0{hl-^ cxp{2/ht,)). 

^TT J\u\<l/ho 



Moreover, 
This leads to

and this term is negligible compared with the first term appearing in the right
hand side of (15) as soon as
$2(\delta - \beta')/\bar{s} > \bar{s}/\underline{s}$. This leads to the result. □

Proof of Corollary 3. We use the same notation as in the proofs of Corollaries 1
and 2. Moreover, $T_n^0$ is the test statistic constructed with the deterministic
kernel $K_n$ and the deterministic bandwidth $h_n$; and $t_n^2$ is the threshold
defined with the true parameter value $s_k$ for the self-similarity index. The
first type error of the test is controlled by

$$ P_{f_0,s_k}(\Delta_n^* = 1) = P_{f_0,s_k}(|\hat{T}_n^0|\hat{t}_n^{-2} > C^*) \le P_{f_0,s_k}(\hat{s}_n \ne s_k) + P_{f_0,s_k}(|T_n^0|t_n^{-2} > C^*). $$

The first term on the right hand side of this inequality converges to zero
according to Proposition 1. Moreover, Theorem 4 in [1] shows that

/,Sfc-l u2P+Sk-l 

Var/„,,,(T°) < 0(1)^V ^M^/K') + 0{l)^ exp(2/;i^^). 

Finally, we get 

1 



^fo,sA\Ty~'>n<j^ 



{C*)Hi 



( - h^k-l U20+Sk-1 1 

X 0{htf) + 0(1)^V ^M^/K') + 0(1)^ cxp(2//j:'=) \ , 

which is actually $O(1)/C^*$. Choosing $C^*$ large enough achieves the control of
the first error term.

We now turn to the second type error term. Under hypothesis $H_1(C, \Psi_n)$,
there exists some $\beta$ such that $f$ belongs to $S(\beta, L)$ and
$\|f - f_0\|_2^2 \ge C\psi_{n,\beta}^2$. We write

$$ P_{f,s_k}(\Delta_n^* = 0) = P_{f,s_k}(|\hat{T}_n^0|\hat{t}_n^{-2} \le C^*) \le P_{f,s_k}(\hat{s}_n \ne s_k) + P_{f,s_k}(|T_n^0|t_n^{-2} \le C^*). $$

As already seen, the first term in the right hand side of this inequality
converges to zero, so we only deal with the second one. We define
$B_{f,s_k}(T_n^0) = E_{f,s_k}T_n^0 - \|f - f_0\|_2^2$. Thus

P/....(|T°|i-2 <n < Ff,s,{\T^-Ef,sX\ > \\f-fo\\l-CHl + Bf,,,{T^)) 
< ^-^Ls.JT^i) (,g) 
According to [1], we have

$$ |B_{f,s_k}(T_n^0)| \le C_1 h_n^{2\beta}, $$

where $C_1 > 0$ is a constant depending only on $L$ and on the noise distribution.
Under hypothesis $H_1(C, \Psi_n)$, we also have
$\|f - f_0\|_2^2 \ge C\psi_{n,\beta}^2$. Thus,

\\.f-h\\l-CHl + Bf^,,{T^^) 



lognV '■'' n^f^O^-n^Y" r. ^log 



> c ^- - CM ^- - Ci 



> a 



2 / V 2 y V 2 

log 71^ -2/3/sfc 



where $a = C - C^* - C_1$ is positive whenever $C > C^0 := C^* + C_1$. Returning
to (16), we get

P/,..(|r°|t-^)<%^Var;,,,(r°). 

Computation of the variance follows the same lines as under hypothesis Hq. We 
obtain 

Var,,jr°) < 0(l)^exp(2//.-) (^/.f + H^^EM^) . 

The choice of the bandwidth ensures that the second type error term converges
to zero. □